Abstract

Both in the plane and in space, we invert the nonlinear Ullman transformation 
for 3 points and 3 orthographic cameras. While Ullman's theorem assures a unique 
reconstruction modulo a reflection for 3 cameras and 4 points, we find a locally 
unique reconstruction for 3 cameras and 3 points. Explicit reconstruction formulas
allow one to decide whether the picture data of three cameras seeing three points can be
realized as a point-camera configuration.

1 Introduction 

Ullman's theorem in computer vision is a prototype of a structure from motion
result. Given m planes in space and n points for which we know the orthogonal
projections of the points onto the planes, we want to recover the planes and the points.
The problem can also be formulated as follows: a fixed orthographic camera observes
a point configuration which undergoes a rigid transformation. Taking m pictures
of this rigid n-body motion, how do we reconstruct the body as well as
its motion? Ullman's theorem is often cited as follows: "For rigid transformations,
a unique metrical reconstruction is known to be possible from three orthographic
views of four points".

While 3 points in general position can be reconstructed from 2 orthographic
projections if the image planes are known, one needs 3 views to recover also the
camera parameters. While Ullman's theorem is stated for four points, three points
are enough for a locally unique reconstruction. In fact, Ullman's proof already
demonstrated this. We produce algebraic inversion formulas in this paper. Ullman's
transformation is a nonlinear polynomial map which computer algebra systems are
unable to invert directly. Ullman's proof idea is to reconstruct the intersection lines
of the planes first. Even then, computer algebra systems produce complicated solution
formulas because quartic polynomial equations have to be solved. Fortunately, it is
possible to reduce the complexity.

The fact that four points produce an overdetermined system has been described
by Ullman as follows: "the probability that three views of four points not moving
rigidly together will admit a rigid interpretation is low. In fact, the probability is
zero." ([11], section 4.4, p. 149.) In other words, for 3 given sets of 4
points in the plane, a reconstruction of a 3D scene with 4 points and 3 cameras is in
general not possible. Indeed, the camera-point space has a much lower dimension
than the space of photo configurations. Assuming 3 cameras, it is for 3 points
that the number of unknowns matches the number of equations. Let's look at a
simple dimensional analysis for three points and three cameras. See [8] for arbitrary
cameras. One point can be fixed at the origin and one camera can be placed as
the xy-plane. Because a general rotation in space needs 3 parameters, there are 3
unknowns for each remaining camera. Because the projections onto the first camera
already give the x, y coordinates of the points, we only need to know the heights of the
points. This leaves us with 8 unknowns: $C = (z_1, z_2, \theta_1, \phi_1, \gamma_1, \theta_2, \phi_2, \gamma_2)$.
Every point-camera pair produces 2 real coordinates $Q_j(P_i)$, so that the two remaining
cameras and the two remaining points provide us with a total of 8 data values. Let $R_j$
denote the rotation matrices defined by the angles $\theta_j, \phi_j, \gamma_j$, let
$p_j = R_j(1, 0, 0)$, $q_j = R_j(0, 1, 0)$ be the basis in the camera planes and denote by
$P_i = (x_i, y_i, z_i)$ the points. The nonlinear structure from motion transformation F
from $R^8$ to $R^8$ is

$$ F(C) = (p_2 \cdot P_2, \; q_2 \cdot P_2, \; p_2 \cdot P_3, \; q_2 \cdot P_3, \; p_3 \cdot P_2, \; q_3 \cdot P_2, \; p_3 \cdot P_3, \; q_3 \cdot P_3) . $$

It needs to be inverted explicitly on the image $\{F(C) \mid \det(F)(C) \neq 0\}$, where
det(F) denotes the Jacobian determinant of F. Because
F is not surjective, it is an interesting question to characterize the image data which
allow a reconstruction. By the implicit function theorem, the boundary of this
set is $\{F(C) \mid \det(F)(C) = 0\}$.



Figure 1 Given three planes and three points, the map F produces projections of the 
points on the planes. The problem is to rebuild the planes and the points from these 
data. The nonlinear map F from camera point configurations to the photographic 
data space is finite to one. The set for which {det(F) = 0} is mapped to the boundary 
of the image which is a proper subset of all possible photographic data. 

As Ullman has pointed out, the reconstruction is not unique: simultaneously changing
the signs of $P_i$, $p_j$ and $q_j$ does not change the image points. This is the case for any
number of points $P_i$ and any number of cameras spanned by a vector pair $(q_j, p_j)$, because
the image data $(P_i \cdot q_j, P_i \cdot p_j)$ from which we want to recover $P_i, p_j, q_j$ are the same
if $P_i, p_j, q_j$ are replaced by $-P_i, -p_j, -q_j$. So, even with arbitrarily many cameras,
the structure is never uniquely recoverable. Fewer ambiguities occur with a fourth point
$P_4$, as Ullman has shown. While adding this fourth point reduces the number of
mirror ambiguities, it adds constraints so that almost all image data are unrealizable.
Indeed, a reconstruction is only possible on a codimension 3 manifold of
photographic data situations.

Historically, Ullman's theorem is a key result. It provides a link between computer
vision, psychology, artificial intelligence and geometry. We should point out
that while Ullman's setup is very familiar from computer-aided design (CAD), where
an engineer works with three views of a three-dimensional object, there is an essential
difference with structure from motion: unlike in CAD, we do not know the
cameras, and finding them is part of the reconstruction problem.

2 Ullman's theorem in two dimensions 

The two-dimensional Ullman problem is interesting by itself. The algebra is simpler 
than in three dimensions but it is still not completely trivial. The two dimensional 
situation plays an important role in the 3 dimensional problem because the three 
dimensional situation reduces to it if the three planes have coplanar normal vectors. 
Let's first reformulate the two-dimensional Ullman theorem in a similar fashion as 
Ullman did. A more detailed reformulation can be found at the end of this section. 

Theorem 2.1 (Two dimensional Ullman theorem) In the plane, three different
orthographic affine camera images of three noncollinear points determine both
the points and the cameras, in general up to a reflection at the camera planes.



Figure 2 The setup for the structure from motion problem with three orthographic
cameras and three points in two dimensions. One point is at the origin, one camera 
is the x-axis. The problem is to find the y coordinates of the two points as well as 
the two camera angles from the scalar projections onto the lines. 

Proof. With the first point at the origin (0, 0), the translational symmetry of the
problem is fixed. Because cameras can be translated without changing the pictures,
we can assume that all camera planes go through the origin (0, 0). By having the
first camera as the x-axis, the rotational symmetry of the problem is fixed. We are
left with 6 unknowns: the coordinates $(x_i, y_i)$ of the two remaining points and the
directions $v_j = (\cos(\alpha_j), \sin(\alpha_j))$ of the other two cameras. Because the
x-coordinates of the points can directly be seen by the first camera, only 4 unknowns
remain: $y_1, y_2, \alpha_1, \alpha_2$. We know the image data $a_{ij}$, the scalar component of the
projection of the point $P_i$ onto the camera j. This is what the photographer j sees of
the point $P_i$. We have the equations

$$ a_{ij} = P_i \cdot v_j = x_i \cos(\alpha_j) + y_i \sin(\alpha_j) , $$

which are nonlinear in the angles $\alpha_j$ for given $x_1, x_2$. The problem is to invert the
structure from motion map

$$ F(y_1, y_2, \alpha_1, \alpha_2) = (a_{ij})_{i,j = 1,2} $$

on $R^4$. Up to a sign which depends on the ordering of the four components, it has the
Jacobian determinant

$$ \det(DF) = \sin(\alpha_1) \sin(\alpha_2) \sin(\alpha_1 - \alpha_2) (x_1 y_2 - x_2 y_1) . $$

We see that F is locally invertible if and only if the three cameras are
all different and the three points are not collinear. Because changing the signs of
$P_i$ and $v_j$ does not alter the values $a_{ij} = P_i \cdot v_j$ seen by the photographer, we have a
global reflection ambiguity in the problem. We will give explicit solution formulas
below. □
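The Jacobian determinant stated in the proof can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names are illustrative, the components of F are assumed to be ordered as (a_11, a_12, a_21, a_22), and the Jacobian is approximated by central differences. Because the sign of the determinant depends on the chosen ordering of the components, only absolute values are compared.

```python
import math

def image_data(y1, y2, alpha1, alpha2, x1, x2):
    """Image data a_ij = x_i cos(alpha_j) + y_i sin(alpha_j), ordered a11, a12, a21, a22."""
    points = [(x1, y1), (x2, y2)]
    angles = [alpha1, alpha2]
    return [x * math.cos(t) + y * math.sin(t) for (x, y) in points for t in angles]

def det(matrix):
    """Determinant by Laplace expansion along the first row (adequate for 4x4)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0.0
    for j, entry in enumerate(matrix[0]):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def jacobian_det_numeric(y1, y2, alpha1, alpha2, x1, x2, h=1e-6):
    """Central-difference Jacobian determinant with respect to (y1, y2, alpha1, alpha2)."""
    base = [y1, y2, alpha1, alpha2]
    columns = []
    for k in range(4):
        plus, minus = base[:], base[:]
        plus[k] += h
        minus[k] -= h
        f_plus = image_data(*plus, x1, x2)
        f_minus = image_data(*minus, x1, x2)
        columns.append([(fp - fm) / (2 * h) for fp, fm in zip(f_plus, f_minus)])
    # Row i holds the partial derivatives of the i-th output component.
    jacobian = [[columns[k][i] for k in range(4)] for i in range(4)]
    return det(jacobian)

def jacobian_det_formula(y1, y2, alpha1, alpha2, x1, x2):
    """Closed form from the proof of Theorem 2.1."""
    return (math.sin(alpha1) * math.sin(alpha2)
            * math.sin(alpha1 - alpha2) * (x1 * y2 - x2 * y1))
```

For a generic configuration the two absolute values should agree to within the finite-difference error, and for collinear points such as (0, 0), (1, 2), (2, 4) both should vanish.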



Can this result be improved? What is the image of the map F and how can one 
characterize triples of pictures for which a reconstruction is possible? 

To see whether the result is sharp, let's look at the dimensions. In dimension d,
an affine orthographic camera is determined by $d(d-1)/2 + (d-1)$ parameters, where
$d(d-1)/2 = \dim(SO_d)$, and a point by d parameters. We gain (d - 1) coordinates for each
point-camera pair. The global Euclidean symmetry of the problem (rotating and translating
the point-camera configuration does not change the pictures) gives us the structure
from motion inequality for orthographic cameras

$$ nd + m[d(d-1)/2 + (d-1)] \le (d-1)nm + d + d(d-1)/2 , $$

which for m = 3 and d = 2 reduces to $2n + 6 \le 3n + 2 + 1$, showing that n = 3 is
sharp. See [8] for more details.
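The counting argument is easy to tabulate. The sketch below is only an illustration with hypothetical function names. It assumes the camera parameter count f = d(d-1)/2 + (d-1) and the symmetry dimension g = d + d(d-1)/2 given in the final remarks of this paper, and compares the number of unknowns nd + mf with the number of available equations (d-1)nm + g.

```python
def unknowns(n, m, d):
    """Number of unknowns: n points with d coordinates, m cameras with f parameters."""
    f = d * (d - 1) // 2 + (d - 1)
    return n * d + m * f

def equations(n, m, d):
    """Number of image coordinates plus the dimension g of the Euclidean symmetry group."""
    g = d + d * (d - 1) // 2
    return (d - 1) * n * m + g

def count_allows_reconstruction(n, m, d):
    """True if the dimension count does not rule out a locally unique reconstruction."""
    return unknowns(n, m, d) <= equations(n, m, d)

# Three cameras in the plane: two points are too few, three points give equality.
for n in (2, 3, 4):
    print(n, unknowns(n, 3, 2), equations(n, 3, 2), count_allows_reconstruction(n, 3, 2))
```

With three cameras, equality of the two counts occurs at n = 3 both for d = 2 and for d = 3, while n = 4 gives strictly more equations than unknowns, which is the overdetermined situation of Ullman's original statement.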



Figure 3 The set of dimension pairs (n, m) for which a reconstruction is not pos- 
sible by the structure from motion inequality. We see that the point (n, m) = (3, 3) 
is the only point, where we have equality. 

Lemma 2.2 If the points are collinear, a nontrivial deformation of the point-camera 
configurations is possible with arbitrarily many points and arbitrarily many cameras. 

Proof. The first point is fixed at the origin $P_1 = (0, 0)$. Let $P_2(t) = (1, 1 + t)$. Draw
arbitrarily many cameras for t = 0. If the point $P_2(t)$ is changed, the cameras can
be rotated so that the scalar projections of $P_2(t)$ onto the lines stay the same. Also
the pictures of any scalar multiple $P_k(t) = \lambda_k P_2(t)$ stay the same. □




Figure 4 For arbitrarily many orthographic affine cameras and arbitrarily many 
points, there are ambiguities if the points are collinear. The picture shows 3 cameras 
and 6 points. The deformation of one camera plane deforms the points which in
turn forces an adjustment of the other camera planes. 

It is also not possible to reduce the number of cameras to two cameras, stereo 
vision. 

Lemma 2.3 With two cameras, a deformation is possible with arbitrarily many 
points. 

Proof. One camera is the x-axis. Take n points. We can move them so that the
projection onto the x-axis stays the same and so that the scalar projection onto the
second camera keeps the same signed distance from the origin. □

Figure 5 With two cameras, there are always ambiguities. Fix the first camera as
the x-axis and let the second camera be the line r(s) = sv. Take image coordinates
$x_1, \dots, x_n$ and $s_1, \dots, s_n$. Define $P_i$ as the intersection of the line $x = x_i$ with the
line perpendicular to v through $r(s_i)$. Now, if we deform the second camera by turning
the vector v, we also have a deformation of the points $P_i$ without changing the image
data.
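The two-camera deformation can be made concrete with a few lines of code. This is a minimal sketch under the assumption that the second camera direction is v = (cos(theta), sin(theta)) with sin(theta) nonzero; the function names are illustrative. For fixed image data, every admissible angle theta yields a different point configuration with the same pictures.

```python
import math

def points_from_images(xs, ss, theta):
    """Points P_i = (x_i, y_i) on the line x = x_i with P_i . v = s_i, v = (cos theta, sin theta)."""
    return [(x, (s - x * math.cos(theta)) / math.sin(theta)) for x, s in zip(xs, ss)]

def images(points, theta):
    """Pictures taken by the x-axis camera and by the camera with direction angle theta."""
    first = [x for x, _ in points]
    second = [x * math.cos(theta) + y * math.sin(theta) for x, y in points]
    return first, second

xs = [0.5, -1.0, 2.0]   # pictures in the first camera
ss = [1.0, 0.3, -0.7]   # pictures in the second camera
for theta in (0.8, 1.5):
    pts = points_from_images(xs, ss, theta)
    print(theta, pts, images(pts, theta))
```

Both values of theta reproduce the same image data from different points, which is the one-parameter ambiguity of Lemma 2.3.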

Is the nonlinear map F surjective? The answer is not obvious because intuition
does not help much: it is difficult to visualize 3 points and three lines in the plane
if we only know the scalar projections onto the three lines. We first tried and failed
to prove that the map F is surjective. Indeed, the answer is no: the map F is not
surjective. The transformation F maps $R^4$ to a proper subset of $R^4$. The boundary
is the image of the set det(DF) = 0, which is the set of situations where points are
collinear, a camera collision happens, or a point collision happens. Below we will
look at explicit configurations which are not in the image.

How do we invert the structure from motion map F? Let us reformulate the
problem in index-free notation. Assume the point coordinates are (u, p), (v, q) and
the camera angles are $\alpha, \beta$. The picture of the first point has coordinates (a, b) and
the picture of the second point has coordinates (c, d). We want to find $\alpha, \beta, p, q$
from the equations

$$ \begin{aligned}
u \cos(\alpha) + p \sin(\alpha) &= a \\
u \cos(\beta) + p \sin(\beta) &= b \\
v \cos(\alpha) + q \sin(\alpha) &= c \\
v \cos(\beta) + q \sin(\beta) &= d .
\end{aligned} $$

(The scalars p, q have no relation with the vectors p and q used in the three-dimensional
problem mentioned in the introduction.) After eliminating p and q, we get
$\alpha, \beta$ as solutions of the two equations

$$ \frac{a - u \cos(\alpha)}{\sin(\alpha)} = \frac{b - u \cos(\beta)}{\sin(\beta)} , \qquad
   \frac{c - v \cos(\alpha)}{\sin(\alpha)} = \frac{d - v \cos(\beta)}{\sin(\beta)} . $$

The solutions are the intersections of the level curves of two functions on the
two-dimensional torus. With $x = \cos(\alpha)$, $y = \cos(\beta)$, squaring gives

$$ \begin{aligned}
(1 - y^2)(a - ux)^2 &= (1 - x^2)(b - uy)^2 \\
(1 - y^2)(c - vx)^2 &= (1 - x^2)(d - vy)^2 .
\end{aligned} $$

This system of polynomial equations has explicit solutions. They are the first two of
the following set of 4 solution formulas. The p and q can then be obtained directly.

$$ \begin{aligned}
x = \cos(\alpha) &= \frac{(-c^2 + d^2)u^2 + 2acuv - 2bd(ac + uv) + a^2(d^2 - v^2) + b^2(c^2 + v^2)}{2(bc - ad)(-du + bv)} \\
y = \cos(\beta) &= \frac{(c^2 - d^2)u^2 - 2acuv + 2bd(-ac + uv) + b^2(c^2 - v^2) + a^2(d^2 + v^2)}{2(bc - ad)(cu - av)} \\
p &= (a - u \cos(\alpha)) / \sin(\alpha) \\
q &= (c - v \cos(\alpha)) / \sin(\alpha) .
\end{aligned} $$

We see that there are 0, 1 or 2 real solutions and that the only ambiguity is
$(\alpha, \beta, p, q) \to (-\alpha, -\beta, -p, -q)$, because the original equations require $\alpha$ and $\beta$ to
switch signs simultaneously. □
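The inversion formulas can be tried out directly. The following is a minimal sketch rather than the authors' Mathematica implementation; the function names are illustrative. It assumes the convention that the first point (u, p) is seen at a and b by the cameras with angles alpha and beta, and the second point (v, q) at c and d, so that q is recovered from c. The relative sign of sin(beta) is fixed through the ratio sin(beta)/sin(alpha) = (du - bv)/(cu - av), which follows from eliminating p and q.

```python
import math

def forward(u, p, v, q, alpha, beta):
    """Image data (a, b, c, d) of the points (u, p), (v, q) in cameras alpha, beta."""
    a = u * math.cos(alpha) + p * math.sin(alpha)
    b = u * math.cos(beta) + p * math.sin(beta)
    c = v * math.cos(alpha) + q * math.sin(alpha)
    d = v * math.cos(beta) + q * math.sin(beta)
    return a, b, c, d

def invert(u, v, a, b, c, d):
    """Return the list of real solutions (alpha, beta, p, q); empty if none exists."""
    k = b * c - a * d
    x = ((-c**2 + d**2) * u**2 + 2*a*c*u*v - 2*b*d*(a*c + u*v)
         + a**2 * (d**2 - v**2) + b**2 * (c**2 + v**2)) / (2 * k * (-d*u + b*v))
    y = ((c**2 - d**2) * u**2 - 2*a*c*u*v + 2*b*d*(-a*c + u*v)
         + b**2 * (c**2 - v**2) + a**2 * (d**2 + v**2)) / (2 * k * (c*u - a*v))
    if abs(x) >= 1:
        return []                      # cos(alpha) out of range: data not realizable
    ratio = (d*u - b*v) / (c*u - a*v)  # sin(beta) / sin(alpha)
    solutions = []
    for sign in (1, -1):               # the two branches differ by the reflection
        sin_alpha = sign * math.sqrt(1 - x * x)
        alpha = math.atan2(sin_alpha, x)
        beta = math.atan2(ratio * sin_alpha, y)
        p = (a - u * x) / sin_alpha
        q = (c - v * x) / sin_alpha
        solutions.append((alpha, beta, p, q))
    return solutions

data = forward(1.0, 2.0, 3.0, -1.0, 0.7, 2.0)
print(invert(1.0, 3.0, *data))
```

Degenerate data, where one of the denominators vanishes, are not handled in this sketch. For generic data in the image of F the two returned tuples differ by the reflection described above.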




Figure 6 The nonlinear map F is 2 : 1. Here are the two solutions to a typical 
image set. 

3 The image of F 

Lemma 3.1 The image of F is a proper subset of all photographic image data. 

Proof. To see this, let's look at the two-dimensional surface defined by a = 0, d = 0,
b = u. The inverse of F on this set is given by

$$ \begin{aligned}
\cos(\alpha) &= D \\
\cos(\beta) &= 1 - 2D^2 \\
p &= -bD(1 - D^2)^{-1/2} \\
q &= \frac{2c^2 - v^2}{2c} (1 - D^2)^{-1/2} ,
\end{aligned} $$

where D = v/(2c) and the branch $\sin(\alpha) > 0$ has been chosen. The inverse exists for
|D| < 1. For |D| > 1 the camera and point data become complex.
The image data (u, v, a, b, c, d) = (1, 5, 0, 1, 1, 0), for example, do not correspond
to any actual situation of points (u, p), (v, q) and cameras with directions $\alpha, \beta$,
since here D = 5/2. It is the situation where the points are seen with the first camera
at 1 and 5, by the second camera at 0 and 1 and by the third camera at 1 and 0. □



Let's look a bit closer at the system of nonlinear equations for $\alpha$ and $\beta$ obtained by
eliminating p and q. Dividing the first by the second shows that $x = \cos(\alpha)$ and
$y = \cos(\beta)$ are related by a Möbius transform x = Ay. A second equation relating
$\alpha$ and $\beta$ is obtained from

$$ \begin{aligned}
\sin(\alpha)(b - u \cos(\beta)) &= \sin(\beta)(a - u \cos(\alpha)) \\
\sin(\alpha)(d - v \cos(\beta)) &= \sin(\beta)(c - v \cos(\alpha))
\end{aligned} $$

so that

$$ \frac{\sin(\alpha) b - \sin(\beta) a}{\sin(\alpha) d - \sin(\beta) c} = \frac{u}{v} . $$

We see that u/v is the Möbius transform of $\sin(\beta)/\sin(\alpha)$. Because the inverse of a
Möbius transform is again a Möbius transform, we know that $\sin(\beta)/\sin(\alpha)$ is the
Möbius transform of u/v. This is a real number G. The upshot is that we have two
equations

$$ \sin(\beta) = G \sin(\alpha) , \qquad \cos(\beta) = \frac{A + B \cos(\alpha)}{C + D \cos(\alpha)} $$

for the unknowns $\alpha, \beta$. This quartic equation for $\cos(\beta)$ has 0 or 2 solutions, as the
explicit formulas show. There is no solution, for example, if the Möbius transformation
maps [-1, 1] to an interval disjoint from [-1, 1].

Remark. The reconstruction equations can also be rewritten with $r = u + ip$, $s = v + iq$,
$z = \cos(\alpha) - i \sin(\alpha)$, $w = \cos(\beta) - i \sin(\beta)$ and complex a, b, c, d as

$$ rz = a, \quad rw = b, \quad sz = c, \quad sw = d , $$

where $|z| = 1$, $|w| = 1$ and Re(r), Re(s), Re(a), Re(b), Re(c), Re(d) are known. These
are 8 equations for 8 unknowns. It is a reformulation with more variables but the
equations look simpler.
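A quick numerical check of this complex reformulation, written as a small sketch with illustrative names: the real parts of the four products rz, rw, sz, sw should reproduce the image data u cos(angle) + p sin(angle) and v cos(angle) + q sin(angle).

```python
import cmath
import math

def complex_images(u, p, v, q, alpha, beta):
    """Return the complex numbers (rz, rw, sz, sw) of the remark."""
    r = complex(u, p)              # first point  u + i p
    s = complex(v, q)              # second point v + i q
    z = cmath.exp(-1j * alpha)     # cos(alpha) - i sin(alpha), modulus 1
    w = cmath.exp(-1j * beta)      # cos(beta)  - i sin(beta),  modulus 1
    return r * z, r * w, s * z, s * w

u, p, v, q, alpha, beta = 1.0, 2.0, 3.0, -1.0, 0.7, 2.0
for value in complex_images(u, p, v, q, alpha, beta):
    print(value.real)
print(u * math.cos(alpha) + p * math.sin(alpha))  # agrees with the first real part
```

Only the real parts are observed; the imaginary parts of the four products are among the unknowns of the reformulated system.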



We summarize: 

Theorem 3.2 The structure from motion map F for three points and three cameras
in the plane maps $R^4$ onto a proper subset of $R^4$. The set det(F) = 0 is the set of
camera-point configurations for which two cameras are the same or where the three
points are collinear. This set is mapped into the boundary of the data set $F(R^4)$.
The map F is 2 : 1 in general. The only ambiguity is a reflection. The explicit
reconstruction of the points $P_2 = (u, p)$, $P_3 = (v, q)$ and camera angles $\alpha, \beta$ is

$$ \begin{aligned}
\cos(\alpha) &= \frac{(-c^2 + d^2)u^2 + 2acuv - 2bd(ac + uv) + a^2(d^2 - v^2) + b^2(c^2 + v^2)}{2(bc - ad)(-du + bv)} \\
\cos(\beta) &= \frac{(c^2 - d^2)u^2 - 2acuv + 2bd(-ac + uv) + b^2(c^2 - v^2) + a^2(d^2 + v^2)}{2(bc - ad)(cu - av)} \\
p &= (a - u \cos(\alpha)) / \sin(\alpha) \\
q &= (c - v \cos(\alpha)) / \sin(\alpha) ,
\end{aligned} $$

if the second point $P_2$ is captured by the second camera $Q_2$ at a and by the third
camera $Q_3$ at b, and the third point $P_3$ is seen with the second camera $Q_2$ at c and with
the third camera $Q_3$ at d.

4 Ullman's theorem in three dimensions 

In three dimensions, the structure from motion map $F(P, Q) = Q_j(P_i)$ for n points
and m cameras is a nonlinear map $R^{3n} \times SO_3^m \to R^{2nm}$. For m = 3, n = 3,
this is a map from $R^{18}$ to $R^{18}$. By fixing the position of the first point and setting
the first camera as the xy-plane, we have a map $R^6 \times SO_3^2 \to R^{12}$. Because
the first camera freezes the x- and y-coordinates of all the points, four of these equations
are trivial and we have a nonlinear map $R^2 \times SO_3^2 \to R^8$. This map from an
8-dimensional manifold to an 8-dimensional manifold is finite to 1 and difficult to
invert directly. Computer algebra systems seem unable to do the inversion, even
when rotation angles are replaced by quaternions, which produce polynomial maps.
The crucial idea of Ullman is to perform the reconstruction in two steps.

Figure 7 The set of dimension pairs (n, m) in three dimensions for which a
reconstruction is not possible by simple dimensional analysis.

Theorem 4.1 (Ullman theorem in three dimensions for 3 points) In three 
dimensions, three orthographic pictures of three noncollinear points determine both 
the points and camera positions up to finitely many reflections. The correspondence 
is locally unique. 

We assume that the three planes are different and that the three points are
different. Otherwise, we would have a situation with m < 3 or n < 3, where finding the
inverse is not possible. If the normals to the planes are coplanar, that is, when the
three planes go through a common line after some translation, then the problem
can be reduced to the two-dimensional Ullman problem.

Figure 8 The setup for the structure from motion problem with three orthographic
cameras and three points in three dimensions. One point is at the origin, one camera
is the xy-plane. The problem is to find the z-coordinates of the two points as well
as the three Euler angles for each camera from the projections onto the planes.

Because Ullman stated his theorem with 4 points and this result is cited so
widely (see for example [6], [10]), we give more details on Ullman's proof for 3
points. The only reason to add a fourth point is to reduce the number of ambiguities
from typically 64 to 2. We will give explicit solution formulas which provide an
explicit reconstruction in the case of 3 points. One could write down explicit
algebraic expressions for the inverse.

Proof. Again we choose a coordinate system so that one of the cameras is the xy-plane
with the standard basis $q_0, p_0$. One of the three points is fixed at the
origin O. The problem is to find two orthonormal frames $p_j, q_j$ in space spanning two
planes $S_1$ and $S_2$ through the origin and two points $P_1, P_2$ from the projection data

$$ a_{ij} = P_i \cdot q_j , \qquad b_{ij} = P_i \cdot p_j . \qquad (1) $$

The camera j sees the point $P_i$ at the position $(a_{ij}, b_{ij})$. Because an orthonormal
2-frame needs 3 parameters $(\theta_j, \phi_j, \gamma_j)$ and each point in space has 3 coordinates,
there are 2 · 3 + 2 · 3 = 12 unknowns and 12 equations $a_{ij} = P_i \cdot q_j$ and $b_{ij} = P_i \cdot p_j$,
i = 1, 2, j = 0, 1, 2. Because the projection to the xy-plane is known, there are 4
variables which can directly be read off. We are left with a nonlinear system of 8
equations and 8 unknowns $(z_1, z_2, \theta_1, \phi_1, \gamma_1, \theta_2, \phi_2, \gamma_2)$. Just plug in

$$ p_j = \begin{pmatrix}
\cos(\gamma_j) \cos(\theta_j) - \cos(\phi_j) \sin(\gamma_j) \sin(\theta_j) \\
-\cos(\phi_j) \cos(\theta_j) \sin(\gamma_j) - \cos(\gamma_j) \sin(\theta_j) \\
\sin(\gamma_j) \sin(\phi_j)
\end{pmatrix} , \qquad
q_j = \begin{pmatrix}
\cos(\theta_j) \sin(\gamma_j) + \cos(\gamma_j) \cos(\phi_j) \sin(\theta_j) \\
\cos(\gamma_j) \cos(\phi_j) \cos(\theta_j) - \sin(\gamma_j) \sin(\theta_j) \\
-\cos(\gamma_j) \sin(\phi_j)
\end{pmatrix} $$

and $P_i = (x_i, y_i, z_i)$ into equations (1). The determinant of the Jacobian matrix can
be computed explicitly. It is a polynomial in the 2 unknown position variables $z_1, z_2$
and a trigonometric polynomial in the 6 unknown camera orientation parameters
$\theta_1, \phi_1, \gamma_1, \theta_2, \phi_2, \gamma_2$:

det(J) = sin 2 (0i ) 

• sin 2 (02 ) 

• (Acos(0i)+Bsin(0i)) 

• (Acos(0 2 ) + 5sin(0 2 )) 

• (cos(0 2 )sin(0i)(Acos(0i) + £sin(0i)) 

+ sm(0 2 )(Dsin(0i)sin(0i-0 2 ) + cos(0 1 )(C'cos(0 2 )-Ssin(0 2 )))) , 

where

$$ A = y_2 z_1 - y_1 z_2 , \quad B = x_2 z_1 - x_1 z_2 , \quad C = y_1 z_2 - y_2 z_1 , \quad D = y_1 x_2 - x_1 y_2 . $$

In general, this determinant is nonzero and by the implicit function theorem, the
reconstruction is locally unique.

The main idea (due to Ullman) for the actual inversion is to first find vectors
$u_{ij}$ in the intersection lines of the three planes. For every pair of cameras i, j,
the intersection line can be expressed in two ways:

$$ \alpha_{ij} p_i + \beta_{ij} q_i = \gamma_{ij} p_j + \delta_{ij} q_j . $$

The projection of the two points produces the equations

$$ \alpha_{ij} \, p_i \cdot P_k + \beta_{ij} \, q_i \cdot P_k = \gamma_{ij} \, p_j \cdot P_k + \delta_{ij} \, q_j \cdot P_k . $$

Because the image coordinates $p_i \cdot P_k$ and $q_i \cdot P_k$ are known, these are 2 equations
for each of the three pairs of cameras in the 4 unknowns $\alpha_{ij}, \beta_{ij}, \gamma_{ij}, \delta_{ij}$.
Because additionally $\alpha_{ij}^2 + \beta_{ij}^2 = 1$ and $\gamma_{ij}^2 + \delta_{ij}^2 = 1$, the values of
$\alpha_{ij}, \beta_{ij}, \gamma_{ij}, \delta_{ij}$ are determined.

Only 4 equations are needed to solve for the intersection lines of the planes, not 5
as stated on page 194 of the book [11]. With 5 equations the number
of ambiguities is reduced. Actually, the Ullman equations with 4 equations have
finitely many additional solutions which do not correspond to point-camera
configurations. They can be detected by checking what projections they produce.

We aim to find coordinates $(\alpha_{ij}, \beta_{ij})$ in the plane i and coordinates $(\gamma_{ij}, \delta_{ij})$ in
the plane j of a vector in the intersection of each pair of photographs. Taking the dot
products with the two points $P_1, P_2$ gives the equations

$$ \begin{aligned}
\alpha_{ij} u_{i1} + \beta_{ij} v_{i1} &= \gamma_{ij} u_{j1} + \delta_{ij} v_{j1} & (2) \\
\alpha_{ij} u_{i2} + \beta_{ij} v_{i2} &= \gamma_{ij} u_{j2} + \delta_{ij} v_{j2} & (3) \\
\alpha_{ij}^2 + \beta_{ij}^2 = 1 , \quad \gamma_{ij}^2 + \delta_{ij}^2 &= 1 , & (4)
\end{aligned} $$

where $(u_{ik}, v_{ik})$ denote the image coordinates of the point $P_k$ in the camera i.
They can be solved explicitly, even though the formulas given by the computer algebra
system are too complicated and contain hundreds of thousands of terms. Each of
the above systems is of the form

$$ ax + by = cu + dv , \quad ex + fy = gu + hv , \quad x^2 + y^2 = u^2 + v^2 = 1 . $$

Geometrically, it is the intersection of two three-dimensional planes and two
three-dimensional cylinders in four-dimensional space. From the first two equations, we
have

$$ x = Au + Bv , \quad y = Fu + Gv . $$

By writing $u = \cos(t)$, $v = \sin(t)$ in the equation $x^2 + y^2 = 1$ and replacing
$\cos^2(t) = (\cos(2t) + 1)/2$, $\sin^2(t) = (1 - \cos(2t))/2$, $\sin(t)\cos(t) = \sin(2t)/2$, we get a
quadratic equation for $\cos(2t)$ which has the solutions

$$ \cos(2t) = - \frac{ST \pm W \sqrt{S^2 - T^2 + W^2}}{S^2 + W^2} $$

with $U = (A^2 + F^2)/2$, $V = (B^2 + G^2)/2$, $W = AB + FG$, $S = U - V$, $T = U + V - 1$.
We see that there are 8 solutions to the equations (2). Four of these solutions are
solutions for which $\alpha_{ij} p_i + \beta_{ij} q_i - \gamma_{ij} p_j - \delta_{ij} q_j$ is perpendicular to the plane
containing the three points. These solutions do not solve the reconstruction problem and
these branches of the algebraic solution formulas are discarded. There are 4 solutions
to each Ullman equation which lead to solutions to the reconstruction problem.
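The double-angle reduction can be illustrated by a short routine for the normalized system ax + by = cu + dv, ex + fy = gu + hv, x^2 + y^2 = u^2 + v^2 = 1. This is a sketch with an illustrative function name, not the implementation referred to in this paper. It uses the relation S cos(2t) + W sin(2t) = -T, which is what the substitution gives before squaring, to fix the sign of sin(2t) that belongs to each root cos(2t), and it assumes that the matrix with rows (a, b) and (e, f) is invertible.

```python
import math

def unit_solutions(a, b, c, d, e, f, g, h):
    """Solve ax+by = cu+dv, ex+fy = gu+hv, x^2+y^2 = u^2+v^2 = 1 for (x, y, u, v)."""
    det = a * f - b * e
    # (x, y) = M (u, v) with M = [[A, B], [F, G]]
    A, B = (f * c - b * g) / det, (f * d - b * h) / det
    F, G = (a * g - e * c) / det, (a * h - e * d) / det
    U, V, W = (A * A + F * F) / 2, (B * B + G * G) / 2, A * B + F * G
    S, T = U - V, U + V - 1
    norm2 = S * S + W * W
    disc = norm2 - T * T                     # S^2 - T^2 + W^2
    if norm2 == 0 or disc < 0:
        return []
    root = math.sqrt(disc)
    solutions = []
    for sign in (1, -1):
        cos2t = (-S * T + sign * W * root) / norm2
        sin2t = (-T * W - sign * S * root) / norm2   # from S cos(2t) + W sin(2t) = -T
        t = math.atan2(sin2t, cos2t) / 2
        for shift in (0.0, math.pi):                 # t and t + pi give the same 2t
            u, v = math.cos(t + shift), math.sin(t + shift)
            solutions.append((A * u + B * v, F * u + G * v, u, v))
    return solutions
```

Each returned tuple satisfies the two linear equations by construction and the two normalizations up to rounding. Which of these roots correspond to actual intersection lines still has to be decided by checking the projections they produce.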

Assume we know the three intersection lines in each plane. Because the ground
camera plane is fixed, we know two of the intersection lines. Let's denote by U and V
the unit vectors in those lines. We have to find only the third intersection line, which
contains a unit vector X. This vector X = (x, y, z) can be obtained by intersecting
two cones. Mathematically, we have to solve the system $X \cdot U = r$, $X \cdot V = s$, $|X| = 1$.
This leads to elementary expressions by solving a quadratic equation.
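The system X · U = r, X · V = s, |X| = 1 can be solved in a few lines. The sketch below uses a decomposition of X into a component in the plane spanned by U and V and a component along U × V, which is one elementary way to arrive at the quadratic equation mentioned above; the function names are illustrative and U, V are assumed to be linearly independent.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit_vectors_with_projections(U, V, r, s):
    """All X with X.U = r, X.V = s and |X| = 1 (zero or two solutions in general)."""
    uu, uv, vv = dot(U, U), dot(U, V), dot(V, V)
    gram = uu * vv - uv * uv
    if gram == 0:
        return []                       # U and V are parallel: not covered here
    lam = (r * vv - s * uv) / gram      # X = lam U + mu V + tau (U x V)
    mu = (s * uu - r * uv) / gram
    base = [lam * U[i] + mu * V[i] for i in range(3)]
    normal = cross(U, V)
    rest = 1 - dot(base, base)          # tau^2 |U x V|^2 must equal this
    if rest < 0:
        return []                       # the two cones do not intersect
    tau = math.sqrt(rest / dot(normal, normal))
    return [tuple(base[i] + sign * tau * normal[i] for i in range(3)) for sign in (1, -1)]

print(unit_vectors_with_projections((1, 0, 0), (0, 1, 0), 0.6, 0.0))
```

The two solutions are mirror images of each other with respect to the plane spanned by U and V.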

Once we know the intersection lines, we can get the points $P_1, P_2$ by finding the
intersection of the normal lines to the image points in the photographs.

The Ullman equations have at most 4 solutions. Because there are three
intersection lines, we expect $4^3 = 64$ solutions in total in general.

If the normals to the cameras are coplanar, the problem reduces to a two-dimensional
problem by turning the coordinate system so that the intersection line
is the z-axis. This situation is what Ullman calls the degenerate case. After
finding the intersection line, we are directly reduced to the two-dimensional Ullman
problem. □



The fact that there are solutions to the Ullman equation which do not lead to
intersection lines of photographic planes could have been an additional reason for
Ullman to add a fourth point. Adding a fourth point reduces the number of solutions
from 64 to 2 if the four points are noncoplanar, but it makes most randomly chosen
projection data unreconstructable. With three points, there is an open and
algebraically defined set for which a reconstruction is not possible and an open
algebraically defined set on which the reconstruction is possible and locally unique.
The boundary of these two sets is the image of the set det(F) = 0.

Figure 9 64 solutions to the reconstruction problem in a particular case.

5 When is the reconstruction possible? 

Suppose we are given three photographs, each showing three points. As usual, we know
which points correspond. How can we decide whether there is a point-camera configuration
which realizes these pictures? Of course, we have explicit formulas, but they do not
illustrate the geometry very well.

Define for two complex numbers A, B the interval I(A, B) of possible angles

$$ \arg\left( \frac{e^{i\theta} - A}{e^{i\theta} - B} \right) , $$

where $\theta \in [0, 2\pi)$.

Figure 10 The range of angles I(A, B).

The following lemma deals with the equations which determine the intersection 
lines of the camera planes. 



Lemma 5.1 The equations

$$ ax + by = cu + dv , \quad ex + fy = gu + hv , \quad x^2 + y^2 = u^2 + v^2 = 1 $$

can be solved for the unknowns x, y, u, v for any values of a, b, c, d, e, f, g, h for which

$$ \arg\left( \frac{c + id}{g + ih} \right) \in I\left( \frac{c + id}{a + ib} , \frac{g + ih}{e + if} \right) . $$

Proof. Define $p = a + ib$, $q = c + id$, $r = e + if$, $s = g + ih$. We look for two complex
numbers $z = x - iy$, $w = u - iv$ of modulus 1 such that Re(zp) = Re(wq), Re(zr) =
Re(ws). Therefore $\arg(zp - wq) = \pi/2$, $\arg(zr - ws) = \pi/2$. With $z = e^{i\theta}$, $w = e^{i\phi}$,
this defines two curves on the torus. The solutions are the intersection points. If
$\arg(q/s) \in I(q/p, s/r)$, there is a solution to the problem. □



6 Final remarks 

Explicit implementations. 

We have implemented the reconstruction explicitly in Mathematica 6, a computer
algebra system in which it is now possible to manipulate graphics parameters. We
have programs which invert the nonlinear equations on the spot, both in two and
three dimensions.

Figure 11 Interactive demonstration of the reconstruction in two and three dimensions
with Mathematica. The user can change each of the image parameters and
the computer reconstructs the cameras and the points. We will make these programs
available on the Wolfram Demonstrations Project.

Higher dimensions. 

How many points are needed in d dimensions for 3 orthographic cameras to locally
have a unique reconstruction? In d dimensions, an orthographic camera has
f = d(d - 1)/2 + (d - 1) parameters and the global Euclidean symmetry group has
dimension g = d + d(d - 1)/2. The dimension relations are

    nd + mf = (d - 1)nm + g
    f = d(d - 1)/2 + (d - 1)
    g = d + d(d - 1)/2 .

This gives

    dimension   n(m)                        n(2)   n(3)   n(4)
    dim=2:      n = (2m - 3)/(m - 2)        -      3      3
    dim=3:      n = (5m - 6)/(2m - 3)       3      3      3
    dim=4:      n = (9m - 10)/(3m - 4)      4      4      4

In any dimension, there is always a reflection ambiguity.

Other cameras.

The structure from motion problem can be considered for many other camera types.
The most common is the pinhole camera, a perspective camera. In that case, two
views and 7 points are enough to determine structure from motion locally uniquely,
if the focal parameter is kept the same in both shots and needs to be determined too.
We have studied the structure from motion problem for spherical cameras in detail
in the paper [7] and shown for example that for three cameras and three points in
the plane a unique reconstruction is possible if both the camera and point sets are
not collinear and the 6 points are not in the union of two lines. This uniqueness
result can be proven purely geometrically using Desargues' theorem and is sharp:
weakening any of the three premises produces ambiguities, where the two-line
ambiguity was the hardest to find.

Other fields. 

The affine structure from motion problem can be formulated over other fields too, and
not only over the field of reals k = R or complex numbers k = C. The space S
is a d-dimensional vector space over some field k. A camera is a map Q from S
to a (d - 1)-dimensional linear subspace satisfying $Q^2 = Q$. A point configuration
$\{P_1, P_2, \dots, P_n\}$ and a camera configuration $\{Q_1, \dots, Q_m\}$ define image data
$Q_j(P_i)$. The task is to reconstruct from these data the points $P_i$ and the cameras
$Q_j$. If the field k is finite, the structure from motion problem is a problem in a
finite affine geometry. If the inversion formulas derived over the reals make sense
in that field, then they produce solutions to the problem. "Making sense" depends,
for example, on whether we can take square roots. We might ask the field k to be
algebraically closed so that a reconstruction is possible for all image data.

A question. For orthographic cameras in the plane, the only ambiguity is
a reflection. One can extend the global symmetry group G so that the map F
becomes injective. Can one extend the group in three dimensions also to make the
structure from motion map F globally injective? To answer this, we would need to
understand better the structure of the finite set $F^{-1}(a)$ if a is in the image of F.